Speaker Recognition Based on Fisher Discrimination Dictionary Learning

WANG Wei; HAN Jiqing; ZHENG Tieran; ZHENG Guibin; TAO Yao

doi:10.11999/JEIT 150566

Volume 38 Issue 2

Feb. 2016

WANG Wei, HAN Jiqing, ZHENG Tieran, ZHENG Guibin, TAO Yao. Speaker Recognition Based on Fisher Discrimination Dictionary Learning[J]. Journal of Electronics & Information Technology, 2016, 38(2): 367-372. doi: 10.11999/JEIT 150566

doi: 10.11999/JEIT 150566 cstr: 32379.14.JEIT 150566

Funds:

The National Natural Science Foundation of China (61071181, 61471145), The Major Research Plan of the National Natural Science Foundation of China (91120303)

Received Date: 2015-05-13
Revision Received Date: 2015-09-06
Publish Date: 2016-02-19

Abstract

Sparse representation has been applied successfully to speaker recognition, and the quality of the dictionary plays an important role in its performance. In this paper, structured dictionary learning based on the Fisher criterion is introduced to speaker recognition. Each sub-dictionary of the learned discriminative dictionary corresponds to a class label, so that the training samples of a class are reconstructed with small error by the sub-dictionary of that class. Meanwhile, the sparse coding coefficients have small within-class scatter and large between-class scatter. On the NIST SRE 2003 database, the proposed method achieves an Equal Error Rate (EER) of 7.62%, while the i-vector system based on cosine distance scoring gives an EER of 6.7%. Moreover, combining the two systems yields an EER of 5.07%.

Keywords:

- Speaker recognition
- Dictionary learning
- Sparse representation
- Fisher Discrimination (FD)
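
The Fisher criterion on the coding coefficients can be made concrete with a short sketch. The Python below computes the traces of the within-class and between-class scatter matrices for a toy set of coefficient vectors. It uses the standard Fisher scatter definitions; the exact objective of the paper (weights, regularizers) is not given in this abstract, so the function name `scatter_traces`, the speaker labels, and the numbers are illustrative assumptions only.

```python
from collections import defaultdict


def scatter_traces(codes, labels):
    """Return (tr(S_W), tr(S_B)) for coefficient vectors grouped by label.

    tr(S_W) sums squared distances of each vector to its class mean.
    tr(S_B) sums, per class, n_i times the squared distance between
    the class mean and the global mean.
    """
    dim = len(codes[0])
    groups = defaultdict(list)
    for vec, lab in zip(codes, labels):
        groups[lab].append(vec)

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    global_mean = mean(codes)
    within = 0.0
    between = 0.0
    for vectors in groups.values():
        class_mean = mean(vectors)
        within += sum(sq_dist(v, class_mean) for v in vectors)
        between += len(vectors) * sq_dist(class_mean, global_mean)
    return within, between


# Toy coefficients for two hypothetical speakers (made-up numbers).
codes = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
within, between = scatter_traces(codes, labels)
print(within, between)  # prints "4.0 8.0"
```

A discriminative dictionary in the sense of the abstract would aim to make the first value small and the second value large for the coefficients it produces.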
